update Simpleaf modules, subworkflow #424

DongzeHE · 2025-01-22T17:41:10Z

This reopens #361 after updating the central simpleaf modules (see this PR). I have tested it using a 10x 500 dataset. Once the modules PR is merged, we can start merging this PR.

PR checklist


nf-core-bot · 2025-01-22T17:41:54Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.1.1.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.

grst

A few minor things.


modules/local/alevinqc.nf

modules/local/mtx_to_h5ad.nf

nextflow.config

subworkflows/local/simpleaf.nf

workflows/scrnaseq.nf

subworkflows/local/simpleaf.nf


an-altosian · 2025-02-06T00:41:48Z

Hi @grst ,

I think I am pretty happy with the code now. Interestingly, although I did not touch the code for other aligners, all CI tests except those for simpleaf failed.

I tested my changes locally and everything worked. We can discuss linting and the output structure now.

The current output layout is as follows. The biggest change is that I removed the alevinqc folder and wrote the alevinqc report into each sample's directory.

`-- simpleaf
    |-- Sample_X
    |   |-- simpleaf_qc_report_Sample_X.html
    |   |-- simpleaf_quant
    |   |   |-- af_map
    |   |   |   |-- map.rad
    |   |   |   |-- map_info.json
    |   |   |   `-- unmapped_bc_count.bin
    |   |   `-- af_quant
    |   |       |-- alevin
    |   |       |   |-- quants.h5ad
    |   |       |   |-- quants_mat.mtx
    |   |       |   |-- quants_mat_cols.txt
    |   |       |   `-- quants_mat_rows.txt
    |   |       |-- collate.json
    |   |       |-- featureDump.txt
    |   |       |-- generate_permit_list.json
    |   |       |-- map.collated.rad
    |   |       |-- permit_freq.bin
    |   |       |-- permit_map.bin
    |   |       |-- quant.json
    |   |       `-- unmapped_bc_count_collated.bin
    |   `-- versions.yml
    |-- Sample_Y
    |   |-- simpleaf_qc_report_Sample_Y.html
    |   |-- simpleaf_quant
    |   |   |-- af_map
    |   |   |   |-- map.rad
    |   |   |   |-- map_info.json
    |   |   |   `-- unmapped_bc_count.bin
    |   |   `-- af_quant
    |   |       |-- alevin
    |   |       |   |-- quants.h5ad
    |   |       |   |-- quants_mat.mtx
    |   |       |   |-- quants_mat_cols.txt
    |   |       |   `-- quants_mat_rows.txt
    |   |       |-- collate.json
    |   |       |-- featureDump.txt
    |   |       |-- generate_permit_list.json
    |   |       |-- map.collated.rad
    |   |       |-- permit_freq.bin
    |   |       |-- permit_map.bin
    |   |       |-- quant.json
    |   |       `-- unmapped_bc_count_collated.bin
    |   `-- versions.yml
    `-- mtx_conversions
        |-- Sample_X
        |   |-- Sample_X_raw_matrix.h5ad
        |   |-- Sample_X_raw_matrix.sce.rds
        |   `-- Sample_X_raw_matrix.seurat.rds
        |-- Sample_Y
        |   |-- Sample_Y_raw_matrix.h5ad
        |   |-- Sample_Y_raw_matrix.sce.rds
        |   `-- Sample_Y_raw_matrix.seurat.rds
        |-- combined_raw_matrix.h5ad
        |-- combined_raw_matrix.sce.rds
        `-- combined_raw_matrix.seurat.rds

Please let me know what you think! We are getting close!
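For a quick local check of the layout above, here is a minimal Python sketch that lists the paths the tree implies and reports which ones are missing. The names expected_outputs and missing_outputs are hypothetical helpers, not part of the pipeline, and the per-sample file list is a representative subset copied from the tree rather than an exhaustive spec.

```python
from pathlib import Path

# Per-sample files copied from the layout above (a representative subset).
PER_SAMPLE_FILES = [
    "simpleaf_qc_report_{sample}.html",
    "simpleaf_quant/af_map/map.rad",
    "simpleaf_quant/af_quant/alevin/quants.h5ad",
    "simpleaf_quant/af_quant/alevin/quants_mat.mtx",
    "simpleaf_quant/af_quant/alevin/quants_mat_cols.txt",
    "simpleaf_quant/af_quant/alevin/quants_mat_rows.txt",
    "versions.yml",
]
# Converted matrices written under simpleaf/mtx_conversions/<sample>/.
CONVERSION_FILES = [
    "{sample}_raw_matrix.h5ad",
    "{sample}_raw_matrix.sce.rds",
    "{sample}_raw_matrix.seurat.rds",
]
# Matrices combining all samples, directly under simpleaf/mtx_conversions/.
COMBINED_FILES = [
    "combined_raw_matrix.h5ad",
    "combined_raw_matrix.sce.rds",
    "combined_raw_matrix.seurat.rds",
]


def expected_outputs(outdir, samples):
    """List the paths the layout above implies for the given samples."""
    root = Path(outdir) / "simpleaf"
    paths = []
    for sample in samples:
        # str.format ignores unused keyword arguments, so templates
        # without a {sample} placeholder pass through unchanged.
        paths += [root / sample / f.format(sample=sample) for f in PER_SAMPLE_FILES]
        paths += [
            root / "mtx_conversions" / sample / f.format(sample=sample)
            for f in CONVERSION_FILES
        ]
    paths += [root / "mtx_conversions" / f for f in COMBINED_FILES]
    return paths


def missing_outputs(outdir, samples):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(outdir, samples) if not p.exists()]
```

For example, missing_outputs("results_simpleaf", ["Sample_X", "Sample_Y"]) should return an empty list after a successful run. The nf-test assertions in this PR remain the authoritative check; this is only a convenience sketch.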

grst

A few final minor things. Happy to merge once those are addressed!

Also, please update the CHANGELOG :)

conf/modules.config

grst · 2025-02-10T08:40:57Z

modules/local/templates/mtx_to_h5ad_simpleaf.py

+    if "gene_symbol" in adata.var.columns:
+        adata.var['gene_ids'] = adata.var['gene_symbol']
+    else:
+        adata.var['gene_ids'] = adata.var['gene_id']
+
+    adata.var['gene_versions'] = adata.var['gene_ids']


I don't know what the AnnData object generated by simpleaf looks like, so I'm just commenting to make sure we are on the same page. For consistency across all aligners in scrnaseq, we expect:

adata.var_names are always (Ensembl) gene IDs without the version suffix.

adata.var["gene_symbol"] contains human-readable gene symbols/names. They don't need to be unique.

adata.var["gene_versions"] may contain ENSG IDs including the gene version.

adata.var can contain arbitrary other columns, but I'd avoid redundancies. For example, we don't need gene_id, gene_ids, and the same values in the index; just get rid of the redundant columns in that case.
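To make the convention above concrete, here is a minimal sketch using only the Python standard library (it does not touch anndata itself). The helpers strip_version and build_var_columns are hypothetical names, not functions in the pipeline; the sketch assumes version suffixes have the form ".<digits>" and, as a design choice, rejects inputs whose unversioned IDs collide, since they are meant to serve as var_names.

```python
import re

# Assumption: an Ensembl version suffix is a trailing "." followed by digits.
_VERSION_SUFFIX = re.compile(r"\.\d+$")


def strip_version(gene_id):
    """Return a gene ID without its trailing version suffix, if any."""
    return _VERSION_SUFFIX.sub("", gene_id)


def build_var_columns(gene_ids, gene_symbols):
    """Build var_names plus the gene_symbol / gene_versions columns.

    Returns (var_names, columns), where columns maps a column name to a list.
    """
    if len(gene_ids) != len(gene_symbols):
        raise ValueError("gene_ids and gene_symbols must have the same length")
    var_names = [strip_version(g) for g in gene_ids]
    if len(set(var_names)) != len(var_names):
        raise ValueError("unversioned gene IDs must be unique to use as var_names")
    columns = {
        "gene_symbol": list(gene_symbols),  # human-readable, need not be unique
        "gene_versions": list(gene_ids),    # IDs including the version suffix
    }
    return var_names, columns
```

With illustrative input build_var_columns(["ENSG00000141510.17"], ["TP53"]), var_names becomes ["ENSG00000141510"] while the versioned ID is kept only in gene_versions, so no column duplicates the index.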

modules/local/alevinqc.nf

Co-authored-by: Gregor Sturm <[email protected]>

an-altosian · 2025-02-11T01:21:36Z

I think I have addressed all your comments. Since this is my first PR to scrnaseq and it includes many major changes, could we invite more reviewers before merging it? It would be great if other maintainers could go through the changes, especially the documentation.

Thanks,
Dongze

grst · 2025-02-11T07:17:25Z

Thanks for the updates!
Sure, let's first try @nictru!

fmalmeida · 2025-02-11T07:20:08Z

I cannot promise, but I can try to find some time to review.
And many thanks for the PR, @DongzeHE!

fmalmeida

Hi there,
Thanks for the work on it.

I have added a few sincere questions and some comments for changes (if y'all agree) :)

conf/modules.config

modules/local/alevinqc.nf

nextflow.config

subworkflows/local/simpleaf.nf

fmalmeida · 2025-02-11T07:49:34Z

tests/main_pipeline_alevin.nf.test

+                {assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_X/simpleaf_quant/af_quant/alevin/quants_mat.mtx" ).exists()},
+                {assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_X/simpleaf_quant/af_quant/alevin/quants.h5ad" ).exists()},
+                {assert new File( "${outputDir}/results_simpleaf/simpleaf/mtx_conversions/Sample_Y/Sample_Y_raw_matrix.h5ad" ).exists()},
+                {assert new File( "${outputDir}/results_simpleaf/simpleaf/Sample_Y/simpleaf_quant/af_quant/alevin/quants_mat_cols.txt" ).exists()},


Is it now possible to snapshot any of these alevin files with the new version?
I remember that their sort order used to vary, but I'm not sure about the newest version.

Unfortunately, the order of the mtx file and its row and column names can still vary because of parallelization. For the h5ad file, I could sort it in the mtx_to_h5ad module to fix the order. Do you think that is necessary?
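The sorting idea can be sketched as follows, assuming the matrix is held as 0-based (row, col, value) triplets with unique row and column names. canonicalize_triplets is a hypothetical helper, not the mtx_to_h5ad module itself: sorting both axes by name, remapping the indices, and then sorting the entries makes the result independent of the order in which parallel workers emitted them.

```python
def canonicalize_triplets(row_names, col_names, entries):
    """Reorder a sparse matrix so its layout no longer depends on input order.

    entries: iterable of (row_index, col_index, value) with 0-based indices.
    Returns (sorted_row_names, sorted_col_names, sorted_entries).
    Assumes row and column names are unique.
    """
    row_order = sorted(range(len(row_names)), key=lambda i: row_names[i])
    col_order = sorted(range(len(col_names)), key=lambda j: col_names[j])
    # Map each old index to its position after sorting by name.
    new_row = {old: new for new, old in enumerate(row_order)}
    new_col = {old: new for new, old in enumerate(col_order)}
    remapped = sorted((new_row[r], new_col[c], v) for r, c, v in entries)
    return (
        [row_names[i] for i in row_order],
        [col_names[j] for j in col_order],
        remapped,
    )
```

Two runs that differ only in row/column order then produce identical output, which is what a snapshot test needs; whether this is worth doing inside the pipeline is the open question above.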

Hmm. Generally, I do not think it is necessary. But if it could be part of the snapshots, it would surely add robustness.

@grst, do you think this should go here, or is the PR already big enough that it is better to do it in another one?

I mean having a snapshot is obviously better than not having one, but I won't insist on it.

workflows/scrnaseq.nf

nictru

Great job in general - just a few remarks :)

subworkflows/local/simpleaf.nf

nextflow.config

an-altosian · 2025-02-12T04:15:13Z

OK, I think I have addressed all comments but two:

1. Exposing other cell filtering strategies: thread
2. Testing: thread

For 1, I can add two parameters and implement the logic, not a big deal.

For 2, I realized that I had not renamed the test file, so simpleaf was not being tested. I ran the tests locally and everything worked, but a weird bug showed up in GitHub Actions:

Local test log

nf-test test tests/main_pipeline_simpleaf.nf.test --ci

🚀 nf-test 0.9.2
https://www.nf-test.com
(c) 2021 - 2024 Lukas Forer and Sebastian Schoenherr

nf-test runs in CI mode.

Test Workflow main.nf

  Test [bd8e613a] 'test-dataset_simpleaf_aligner' PASSED (124.819s)


SUCCESS: Executed 1 tests in 125.814s

GitHub Actions workflow error:
https://github.com/nf-core/scrnaseq/actions/runs/13277650102/job/37070051243?pr=424#step:9:199

I could not reproduce this error locally. Any suggestions on how I can address it? Could you download the artifact of the failed job so that I can dig into it?

an-altosian · 2025-02-12T05:15:34Z

So it turns out that the error comes from this line in simpleaf, caused by the internal mtx-to-h5ad conversion (this line), where pola-rs encountered an empty featureDump.txt (this line).

This is strange because this file, generated by alevin-fry in this line, should at least contain its header. The logic is that simpleaf first asks alevin-fry to generate this file, then reads it and adds its contents to the h5ad output. So this file should definitely be there.
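As an illustration of failing early with a clear message instead of an opaque parser error, here is a minimal Python sketch of a header check for a tab-separated file such as featureDump.txt. This is not how simpleaf reads the file (simpleaf is written in Rust and uses pola-rs); parse_tsv_with_header and its source argument are hypothetical, and the sketch only assumes the file is tab-separated with a header row.

```python
def parse_tsv_with_header(text, source="<input>"):
    """Split tab-separated text into (header, rows).

    Raises ValueError naming the source if the text is empty or the first
    line is blank, i.e. the header row is missing.
    """
    lines = text.splitlines()
    if not lines or not lines[0].strip():
        raise ValueError(f"{source} is empty or missing its header row")
    header = lines[0].split("\t")
    # Skip blank lines in the body; every other line is one record.
    rows = [line.split("\t") for line in lines[1:] if line.strip()]
    return header, rows
```

A caller would pass the file contents, e.g. parse_tsv_with_header(Path(p).read_text(), source=str(p)), so an empty featureDump.txt is reported by name at the point where it is read.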

In some runs, the file was present, but I got different md5sums for the sorted h5ad files. This also doesn't make sense because, although rows and columns can be reordered, the count of a specific gene in a specific cell should be consistent.
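One way to test the expectation that per-cell, per-gene counts agree, independently of byte-level md5sums, is to compare two runs keyed by names rather than by position. A minimal sketch, assuming MatrixMarket coordinate format with 1-based indices and the row and column names taken from quants_mat_rows.txt and quants_mat_cols.txt; counts_by_name is a hypothetical helper.

```python
def counts_by_name(mtx_text, row_names, col_names):
    """Map (row_name, col_name) -> value from a MatrixMarket coordinate file.

    Assumes 'coordinate' format with 1-based indices; lines starting with '%'
    (including the %%MatrixMarket banner) are treated as comments.
    """
    lines = [ln for ln in mtx_text.splitlines() if ln.strip() and not ln.startswith("%")]
    n_rows, n_cols, nnz = (int(x) for x in lines[0].split())
    if (n_rows, n_cols) != (len(row_names), len(col_names)):
        raise ValueError("dimension line does not match the name lists")
    counts = {}
    for line in lines[1:]:
        r, c, v = line.split()
        counts[(row_names[int(r) - 1], col_names[int(c) - 1])] = float(v)
    if len(counts) != nnz:
        raise ValueError("number of distinct entries does not match the header")
    return counts
```

If counts_by_name returns equal dictionaries for two runs whose h5ad md5sums differ, the difference lies in how the data were written rather than in the counts themselves; if the dictionaries differ, the quantification itself is not reproducible.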

@rob-p - Do you have any idea what happened here?

fmalmeida · 2025-02-12T08:12:12Z

Super. I do not have any other comments besides the one on the nf-tests, which would be good to address but is not necessary.
So I am now approving.
Thanks for the great work here 😄

an-altosian and others added 4 commits January 20, 2025 11:34

update simpleaf subworkflow

fec1962

adopt new simpleaf modules

c6b87c4

tested changes

f3e2977

Merge pull request #1 from an-altosian/dev

f5863df

Dev

DongzeHE requested a review from fmalmeida January 22, 2025 17:41

DongzeHE requested a review from grst January 22, 2025 17:43

Merge branch 'dev' into dev

90cc34e

grst requested changes Jan 22, 2025


an-altosian and others added 16 commits January 23, 2025 02:20

adopt new t2g format in simpleaf index out

bec0bfa

Merge branch 'DongzeHE:dev' into dev

bc5df5e

fix typos

0e4c242

Merge branch 'dev' of https://github.com/an-altosian/scrnaseq into dev

f82884b

fix typos

2513ef0

update doc

6f3834b

avoid using channel.of

afcc3a9

Merge pull request #2 from an-altosian/dev

afd4d5b

Dev

back compatibility

2bd54ea

Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev

7301865

rewrite mtx_to_h5ad_simpleaf to be aware of USA mode

5ad1727

rewrite mtx_to_h5ad_simpleaf to be aware of USA mode

1b6dc9b

update doc

9bc8e03

fix bug

cc27d13

lint

7ac2aa4

Merge branch 'nf-core:dev' into dev

179adf8

an-altosian added 4 commits February 8, 2025 17:11

prettier

025c771

Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev

3db96cb

switch module dir

8b9c5a6

manually update nf-core-scrnaseq_logo_light.png

65ff578

DongzeHE marked this pull request as ready for review February 8, 2025 17:38

grst mentioned this pull request Feb 10, 2025

fix bugs in simpleleaf_index.nf and config files #429

Closed


Merge branch 'dev' into dev

bc7ea31

grst reviewed Feb 10, 2025


an-altosian and others added 5 commits February 10, 2025 09:00

Update conf/modules.config

ccb5c87

Co-authored-by: Gregor Sturm <[email protected]>

use gene id as var_name in h5ad

960db59

Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev

66949a6

remove local simpleaf modules

a8fb237

Merge branch 'dev' into dev

d212f43

grst requested a review from nictru February 11, 2025 07:17

fmalmeida requested changes Feb 11, 2025


nictru reviewed Feb 11, 2025


subworkflows/local/simpleaf.nf (outdated, resolved)

subworkflows/local/simpleaf.nf (outdated, resolved)

nextflow.config (resolved)

an-altosian added 9 commits February 11, 2025 18:32

addess commenets

ef240e2

Merge branch 'dev' of https://github.com/DongzeHE/scrnaseq into dev

bf8ee23

fix bug for existing index dir

9ddeb87

minor typos

cbdb32b

make sure txp2gene is channel

0ba727e

make sure txp2gene is channel

745e9f6

comprehensive testign

525e866

comprehensive testign

d0444fb

comprehensive testign

d97edc1

fmalmeida approved these changes Feb 12, 2025


